1.  Introduction


For simplicity, let us describe the problem in two dimensions. Assume that
we have a physical variable Z(x,y) in some two-dimensional region. For example, Z(x,y) may
be the ozone concentration in the San Francisco Bay Area. Assume also that this
random field Z(x,y) is measured at fixed points (x_i, y_i), i = 1, 2, ..., n. The problem consists of
estimating Z(x,y) at arbitrary points in the region from the given data.

Nearest neighbor regression (NNR) functions are useful for estimating the
value of Z(x,y). Nadaraya (1964) and Watson (1964) independently proposed an
NNR method which behaves well even when little is known about the distribution of
the measured data points. Stone (1980) showed that non-parametric regression
estimators of this type have a uniform rate of convergence. Stute (1984) showed that if
E[Z^2] is finite, these estimates are asymptotically normal. The same year Cheng (1984)
showed that if the noise about the true regression has finite variance, these estimates are
uniformly consistent. Stute (1984) also showed that these estimates are more efficient
than kernel estimators when there are a limited number of observations in the neighborhood
of the estimated point.

Air quality is monitored in the United States by the monitoring apparatus of
state, local, and federal networks. From 1974 to 1976, 5777 monitoring stations existed.
These air quality data consist of discretely measured points which are not on a regular
grid. Details regarding these data are given in Johnson (1983). Section 2 introduces
NNR as it is applied to these data. Cross-validation is used for choosing the smoothing
parameter. Section 3 introduces a concept called local variance and its relationship to the
necessary amount of smoothing for a set of data. A measure of local variance as a function
of window size is also introduced in this section. Sections 5 and 6 find the window
size for which mean local variance is minimized for three states and two pollutants. The
window size is then related to the smoothing parameter. The choice of smoothing
parameter given by cross-validation is compared to that given by the minimum mean local
variance method.

2.  The  Nearest  Neighbor  Regression  Method 

Let Z_i be the information from the data point positioned at (x_i, y_i). For example,
the value of Z_i is the measured pollutant level at latitude x_i and longitude y_i. Let
(x_p, y_p) be the point of interest of a neighborhood P. The Z_i's may or may not be located in P.
Define d_i as the Euclidean distance from (x_i, y_i) to (x_p, y_p). If d_i is large we want Z_i to be
less influential in the estimate of Z_p than if it were small. Let

m(x_p, y_p) = E(Z_p | x_p, y_p, Z_1, Z_2, \ldots, Z_n).

Weights are required to construct an estimate of m(x_p, y_p). The proposed estimate is

m^*(x_p, y_p) = \sum_{i=1}^{n} \lambda(d_i) Z_i

where \lambda(d_i) is based on an appropriate weight function and the \lambda(d_i) sum to one. More
specifically,

\lambda(d_i) = \frac{\omega(d_i)}{\sum_{j=1}^{n} \omega(d_j)}

where \omega(d_i) = e^{-0.5 d_i^2 / d_0^2}. Since some stations are more active than others, if p_i is the
percentage of time the station located at (x_i, y_i) was active, then we use

\omega(d_i) = p_i e^{-0.5 d_i^2 / d_0^2}


The weights are dependent on distance such that they are relatively large for
measurements within a relatively small radius of the estimated point. The size of d_0
determines how relatively large the weights are: at a given distance, weights are large for
large d_0 and small for small d_0. As d_0 increases the amount of smoothing increases. Figure 1 shows
the shape of the weight functions for constant p_i. Each curve corresponds to a different
d_0: 1, 2, 5, 10, 20, and 50 km. Note that the flattest curve in Figure 1 is for
d_0 = 50 km and thus gives the greatest amount of smoothing, while the steepest curve is
for d_0 = 1 km and gives the least amount of smoothing.
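
The effect of d_0 on the weights can be checked with a short sketch. The Python below is illustrative only: the function names omega and relative_weights are hypothetical, distances and d_0 are assumed to be in the same units (km), and the activity p_i defaults to 1.

```python
import math


def omega(d, d0, p=1.0):
    """Unnormalised weight p * exp(-0.5 * d^2 / d0^2) for a station at distance d."""
    return p * math.exp(-0.5 * (d / d0) ** 2)


def relative_weights(distances, d0, activity=None):
    """Scale the unnormalised weights so that they sum to one."""
    if activity is None:
        activity = [1.0] * len(distances)  # assumption: all stations fully active
    raw = [omega(d, d0, p) for d, p in zip(distances, activity)]
    total = sum(raw)
    return [w / total for w in raw]


# Two stations, 1 km and 10 km from the point of interest.
for d0 in (1, 2, 5, 10, 20, 50):
    near, far = relative_weights([1.0, 10.0], d0)
    print(f"d0 = {d0:>2} km: near = {near:.3f}, far = {far:.3f}")
```

As d_0 grows, the share of weight given to the distant station rises toward one half, which is the sense in which a larger d_0 smooths more.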

These weights were used in the Populations at Risk to Environmental Pollution
(PAREP) project for estimating pollution concentration at the population centroid in
various geographic areas. According to Merrill (1982), these weights were chosen for
three reasons: "(i) the estimated function would be smooth in the vicinity of the estimated
points; (ii) the estimated function need not pass through all measured points; (iii) the area
integral of the estimated function should be finite, so that distant points can be ignored in
the calculation."

To avoid producing estimates from poorly monitored stations, some constraints
in estimation are used. The constraints are as follows. Let (x_(1), y_(1)) denote the
nearest data point to (x_p, y_p). It follows that d_(1) is the minimum distance of a data point
to (x_p, y_p).

If d_(1) ≥ 3 d_0, exclude (x_(1), y_(1)) and conclude that Z_p is inestimable.


If d_(1) < 3 d_0, then all Z_i within 4 d_0 will be used to estimate Z_p.


The 4 d_0 criterion has been constructed to avoid sharp discontinuities at the midpoint
between predicted points which are exactly 6 d_0 apart. This is especially important for
estimating a surface, for which this method may also be used.
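
The estimate and its two constraints can be sketched as follows. This is a minimal illustration, not the PAREP implementation: the function name nnr_estimate and the (x, y, z, p) tuple layout are hypothetical, coordinates are assumed to be planar and in the same units as d_0, and "within 4 d_0" is read as a distance less than or equal to 4 d_0.

```python
import math


def nnr_estimate(target, stations, d0):
    """Weighted-average estimate at `target`; returns None when inestimable.

    stations: list of (x, y, z, p) tuples, where z is the measured value and
    p the fraction of time the station was active (hypothetical layout).
    """
    dists = [math.dist(target, (x, y)) for x, y, _, _ in stations]
    # Constraint 1: the nearest station must be closer than 3 * d0.
    if not dists or min(dists) >= 3 * d0:
        return None
    num = den = 0.0
    for d, (_, _, z, p) in zip(dists, stations):
        # Constraint 2: only stations within 4 * d0 enter the average.
        if d <= 4 * d0:
            w = p * math.exp(-0.5 * (d / d0) ** 2)
            num += w * z
            den += w
    return num / den if den > 0 else None


stations = [(0.0, 0.0, 1.0, 1.0), (3.0, 0.0, 3.0, 1.0)]
print(nnr_estimate((1.5, 0.0), stations, d0=1.0))   # midway between the two stations
print(nnr_estimate((10.0, 0.0), stations, d0=1.0))  # nearest station is 7 * d0 away
```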

One drawback to this method is that these estimates are easily biased toward clusters
of data points. If the point of estimation is equidistant from 10 clustered points on
one side and 1 point on the other side, the estimate using this procedure will be dominated
by the 10 clustered points, when the information from the lone point may
be more valuable since it is the only measured point on that side. A weight which is inversely
proportional to the number of points within a specific distance of the observed
point could be added to account for the clustering. Points in clusters would then have less
influence on the estimate than lone points. Regardless of whether the weighting is
by distance or cluster presence, a smoothing parameter to determine the amount of
smoothing would be included, and thus methods for choosing the appropriate smoother
are still applicable.
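
One way such a cluster weight might look is sketched below. It is an assumption for illustration, not a scheme used in the paper: the name cluster_weights and the counting radius are hypothetical, and each point counts itself among its neighbours.

```python
import math


def cluster_weights(points, radius):
    """Weight each point by 1 / (number of points within `radius`, itself included)."""
    weights = []
    for p in points:
        neighbours = sum(1 for q in points if math.dist(p, q) <= radius)
        weights.append(1.0 / neighbours)
    return weights


# Ten clustered stations near the origin and one lone station 10 km away.
cluster = [(0.0, 0.1 * i) for i in range(10)]
lone = [(10.0, 0.0)]
w = cluster_weights(cluster + lone, radius=1.0)
print(sum(w[:10]), w[10])
```

With this choice the ten clustered points together carry about the same weight as the single lone point, which is the balance the paragraph above asks for.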


2.2  Cross-Validation  to  Determine  the  Amount  of  Smoothing 


To choose the "optimum" value of d_0, a pollution estimate for each station
location was produced from observations at other nearby stations, excluding the station's
own observed value. This estimate is then compared to the actual observed value at that
station location for various d_0's. By varying d_0, we can choose those estimates
which minimize the prediction error. This method is called cross-validation and
is nearly identical to the application in Wahba and Wold (1975). However, they used
only one measure of prediction error, and in this paper four measures (each in an unweighted
and a weighted form) are considered. The idea of leaving out one data point at a time and
seeing how well the various estimates fit the observed values is described fully in
Stone (1974) and Geisser (1975).
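
The leave-one-out procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the names nnr, cv_mse and choose_d0 are hypothetical, stations are (x, y, z) tuples in planar coordinates, station activity is ignored, and only the mean squared prediction error is used as the criterion.

```python
import math


def nnr(target, stations, d0):
    """NNR estimate at `target` from (x, y, z) stations; None if not estimable."""
    dists = [math.dist(target, (x, y)) for x, y, _ in stations]
    if not dists or min(dists) >= 3 * d0:
        return None
    pairs = [(math.exp(-0.5 * (d / d0) ** 2), z)
             for d, (_, _, z) in zip(dists, stations) if d <= 4 * d0]
    return sum(w * z for w, z in pairs) / sum(w for w, _ in pairs)


def cv_mse(stations, d0):
    """Leave-one-out mean squared prediction error for one value of d0."""
    errors = []
    for i, (x, y, z) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]   # drop the station's own value
        estimate = nnr((x, y), others, d0)
        if estimate is not None:                   # count only estimable stations
            errors.append(estimate - z)
    if not errors:
        return None
    return sum(e * e for e in errors) / len(errors)


def choose_d0(stations, candidates):
    """Return the candidate d0 with the smallest cross-validation MSE."""
    scored = [(cv_mse(stations, d0), d0) for d0 in candidates]
    scored = [(mse, d0) for mse, d0 in scored if mse is not None]
    return min(scored)[1] if scored else None
```

Note that the number of stations that can be estimated changes with d_0, so the errors for different d_0 are averaged over different sets of stations.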

Prediction error is defined as the cross-validatory estimate minus the observed
value. Wahba and Wold (1975) use the mean of the squared prediction errors and
call this the cross-validation mean square error technique. This technique is used to estimate
from the data the appropriate degree of smoothing. In this paper, several composite
measures of prediction error are used.

Since any composite function of the prediction errors can be heavily biased by
outliers, the technique should be used vigilantly. Although the possibility of this bias is
not explored here, many points are estimated and compared, so the influence of 1 or 2
outliers should not be substantial.

The composite prediction errors were calculated as follows. Let

z_i = logarithm of the pollutant concentration at location (x_i, y_i).
m_i^* = estimate of Z(x_i, y_i) from surrounding data points.
n = the number of stations that could be estimated.
p_i = the percentage of time station i was active.

The concentrations are geometric means, which are assumed to have a lognormal
distribution. Thus, the logarithm of these values is distributed normally for large
n. Therefore when observed or estimated values are discussed they are in units of the
logarithm of the geometric mean concentration.

The reliability of this estimate is to some extent a function of the station activity
as well as the estimation procedure. For these error functions to reflect the adequacy
of the estimation procedure and not the station activity, they are weighted according to
the percentage of time active.


The  sample  mean  value  of  the  pollutant  level,  Z  is 


The  sample  variance  of  all  the  values  of  pollutant  level,  Z  is 


The  weighted  variance  of  these  values  is 
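
The following error measures have been considered.


(MSE) Mean Squared Error

MSE = \frac{1}{n} \sum_{i=1}^{n} (z_i - m_i^*)^2


(WMSE) Weighted Mean Squared Error

WMSE = \frac{\sum_{i=1}^{n} p_i (z_i - m_i^*)^2}{\sum_{i=1}^{n} p_i}


(RMSE) Ratio of Mean Squared Errors (reported as RSE in Tables 1 and 2)

RMSE = \frac{\sum_{i=1}^{n} (z_i - m_i^*)^2}{\sum_{i=1}^{n} (z_i - \bar{z})^2}


(WRMSE) Weighted Ratio of Mean Squared Errors (reported as WRSE in Tables 1 and 2)

WRMSE = \frac{\sum_{i=1}^{n} p_i (z_i - m_i^*)^2}{\sum_{i=1}^{n} p_i (z_i - \bar{z})^2}


(MAE) Mean Absolute Error, Percent

MAE = \frac{1}{n \bar{z}} \sum_{i=1}^{n} |z_i - m_i^*| \times 100


(WMAE) Weighted Mean Absolute Error, Percent

WMAE = \frac{1}{\bar{z} \sum_{i=1}^{n} p_i} \sum_{i=1}^{n} p_i |z_i - m_i^*| \times 100


(MPE) Mean Percent Error

MPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|z_i - m_i^*|}{z_i} \times 100


(WMPE) Weighted Mean Percent Error

WMPE = \frac{1}{\sum_{i=1}^{n} p_i} \sum_{i=1}^{n} p_i \frac{|z_i - m_i^*|}{z_i} \times 100


Here \bar{z} is the sample mean of the z_i.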


These  formulas  are  similar  to  those  used  by  Breiman  (1977)  for  comparing 
kernel  and  Parzen  multivariate  density  estimation  techniques.  Wahba  and  Wold  (1975) 
use  mean  squared  error  in  their  presentation. 
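
A sketch of how six of these measures might be computed from observed values z, cross-validatory estimates m and activities p is given below. The function name error_measures is hypothetical, and the normalisations are assumptions: weighted measures are divided by the sum of the p_i, and RSE is taken as the ratio of the sum of squared prediction errors to the sum of squared deviations from the overall mean.

```python
def error_measures(z, m, p):
    """Composite prediction-error measures for observations z and estimates m."""
    n = len(z)
    zbar = sum(z) / n                      # overall sample mean
    sum_p = sum(p)                         # assumed normaliser for weighted measures
    sq = [(zi - mi) ** 2 for zi, mi in zip(z, m)]
    ab = [abs(zi - mi) for zi, mi in zip(z, m)]
    return {
        "MSE": sum(sq) / n,
        "WMSE": sum(pi * s for pi, s in zip(p, sq)) / sum_p,
        "RSE": sum(sq) / sum((zi - zbar) ** 2 for zi in z),
        "MAE": 100 * sum(ab) / (n * zbar),
        "MPE": 100 * sum(a / zi for a, zi in zip(ab, z)) / n,
        "WMPE": 100 * sum(pi * a / zi for pi, a, zi in zip(p, ab, z)) / sum_p,
    }


print(error_measures([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))
```

An RSE above one means the NNR estimates predict the left-out values worse than the overall mean does.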


2.3  Smoothing Sulfur Dioxide and Suspended Particulate Data for Ohio State By Cross-Validation

Sulfur dioxide and suspended particulate data for the state of Ohio were used
to illustrate the selection of an optimal d_0 by cross-validation. From 1974 through 1976
there were 185 active stations for sulfur dioxide and 398 active stations for total suspended
particulate. For d_0 = 1, 2, 5, 10, 20 and 50 km, the NNR method was used to predict
the observed value at the location of each of the active stations by leaving out the
station's own value. The difference between the prediction and the observed value is the
prediction error. We would like to choose the d_0 with the smallest composite prediction
error over all observed values.

The eight composite measures of error (explained in section 2.2) were computed
for total suspended particulate and sulfur dioxide data in Ohio. Table 1 shows the
value of each function for suspended particulate and different values of d_0. The value of
d_0 which minimizes the loss for the majority of the unweighted error functions is 5
kilometers. The RSE is the only one which is not consistent with this choice of an optimal
d_0. The weighted error functions, by contrast, consistently choose d_0 equal to 2 km.


TABLE 1.  Error Functions by d_0 (km) for Suspended Particulate in Ohio State.

d_0 (km)    MSE       RSE     MAE    MPE    WMSE     WRSE    WMAE   WMPE
   1        .07562   *.500    4.9    4.9    .0633    .425    4.5    4.4
   2        .06907    .539    4.8    4.9   *.0519   *.404   *4.2   *4.1
   5       *.06586    .621   *4.8   *4.0    .0531    .477    4.3    4.2
  10        .06945    .696    4.9    4.9    .0565    .533    4.4    4.3
  20        .08111    .836    5.3    7.4    .0673    .631    4.7    4.6
  50        .09337    .963    5.6    5.7    .0836    .803    5.7    5.2

* indicates the minimum value and thus corresponds to the optimum d_0.


Table 2 is the analogue of Table 1 for sulfur dioxide data. Note that
both the weighted and unweighted RSE are greater than one for all d_0. This means that the
sum of squared error of the estimate derived from the NNR method is greater than the
sum of squared error using the overall mean. In other words,

\sum_{i=1}^{n} (z_i - m_i^*)^2 > \sum_{i=1}^{n} (z_i - \bar{z})^2

Since the estimate contains only the 'nearest' points and the mean contains all the points
in the region, this could arise from measuring problems such as a large measurement
error or biased measurements. It could also be that the NNR method is inherently wrong
for estimating sulfur dioxide. Unlike suspended particulate, sulfur dioxide is a specific
chemical compound which may diffuse or transform to other compounds in the atmosphere.
Therefore SO2 concentrations may vary more in a smaller neighborhood (i.e.,
have higher local variance). Suspended particulate includes many compounds of a certain
size (e.g. dust) which may not be as apt to change in concentration locally.

TABLE 2.  Error Functions by d_0 for Sulfur Dioxide in Ohio State.

d_0 (km)    MSE      RSE      MAE     MPE     WMSE     WRSE     WMAE    WMPE
   1        4.34     4.42     57.0    54.9    3.49     3.91     46.6    43.1
   2        2.52    *3.56     39.9    38.2    1.69    *2.42    *27.7   *25.8
   5        2.17     3.98     38.7    38.1    1.63     2.82     28.9    27.1
  10        2.11     4.21     39.8    39.9    1.60     2.93     30.6    29.5
  20        2.00     4.32     39.7    39.8    1.56     2.96     31.1    29.8
  50       *1.80     3.95    *38.2   *37.5   *1.38     2.68    *37.5    29.7

* indicates the minimum value and thus corresponds to the optimum d_0.


3.1  Determining  the  Appropriate  Smoother  by  Estimating  Local  Variance 


The appropriate amount of smoothing should be such that insignificant variance
among adjacent stations is smoothed out, and yet significant trends over larger geographic
areas are preserved. Thus, perhaps a more direct method than cross-validation
for finding the optimal d_0, or a range of optimal d_0's, is to construct a measure of the
mean local variance (MLV) as a function of d_0 for a specific area and find the d_0 which
minimizes it. That is, in the NNR method the d_0 which is chosen should be approximately
that d_0, corresponding to a distance from each point, for which the point values
within this distance are, on the average, the most homogeneous. This section explores a
method for measuring mean local variance as a function of distance. The relationship
between d_0 and distance is explained below.

Since the weights are proportional to

\omega(d_i) = e^{-0.5 d_i^2 / d_0^2}

points which are further away relative to d_0 get less weight than points close to the
estimated point. Therefore, for a particular spatial distribution of points, d_0 can be
thought of as a distance at which some multiple of it makes the weights so small that the
station's information receives insignificant weight. Furthermore, only points within four
times d_0 are used in the average; yet the points that are a distance of four times d_0 away give
an absolute weight of

\omega(4 d_0) = e^{-8} = .0003

which can be a small or large relative weight. Since the weights in this method are relative
weights, a point which is far away, say 2.9 d_0, can have a weight of 1.0 if it is the only
point within 3 d_0. Therefore choosing a relationship between distance and d_0 for this
weighting scheme depends on the particular data set and its precision. It is suggested
here to estimate the mean local variance for a particular data set as a function of window
size and choose the appropriate window size on this basis. The relationship between
window size and d_0 is positive but will vary among different data sets.

Local variance is defined as the variance in the data within a small neighborhood.
Local variance is measured in the following manner. A circular window is drawn
around every point, defined by the station location, in the entire geographic area of
interest. Thus, for every point there is an associated circular window of which it is the
center. The radius of each window is chosen as a function of d_0 and is allowed to vary
from 0 to a value large enough to encircle the entire area. For a large enough radius there
will be other points besides the center point contained inside these circular windows. For each
point we can calculate a mean and a variance using the points which lie in its associated circular
window for a specified radius r. For example, denoting k_i as the number of points within
z_i's associated window and z_j as one of those points contained in the window, where j
ranges from 1 to k_i, the window associated with z_i has the following local mean and variance.

\bar{z}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} z_j ,    s_i^2 = \frac{1}{k_i - 1} \sum_{j=1}^{k_i} (z_j - \bar{z}_i)^2

To calculate the local variance of the entire region for a particular r, an average of the
local variances is taken and is called the mean local variance. Local variance as a function
of distance is obtained by varying the size of the circular window.

First, let us decide how to weight the local variances to find the mean local
variance for a particular data set. Since a point may be in more than one window, we
should weight the local variances so that each point contributes a weight of 1 to the average
local variance for the entire region. Each window can be thought of as having k_i - 1
"degrees of freedom", where k_i is the number of points in each circular window. If each
window were independent of every other window (i.e. non-overlapping groups of points)
then the weights would be 1 and mean local variance could be measured by the
unweighted average of the local variances. But since there is apt to be overlap, each
squared difference associated with each point is given a weight of 1/n_i, where n_i = the
number of circular windows containing point (x_i, y_i). With this weighting, each point
contributes a weight of 1 to the mean local variance of the region.

In order to calculate mean local variance we need a denominator which
expresses the number of independent data points in the sum of local variances. We will
call this the "degrees of freedom". The "degrees of freedom" should take account of both
the total number of data points and the amount of overlap in the circular windows. In
fact, if each point contributes 1/n_i then the number of non-independent terms in the
numerator is \sum_{i=1}^{N} 1/n_i. Therefore we have chosen the denominator to be
N - \sum_{i=1}^{N} 1/n_i. This gives the expected answer in a few simple cases which are given below.

From the above discussion, we have constructed the following measure.


Mean Local Variance (MLV)

MLV = \frac{\sum_{j=1}^{N} \sum_{i=1}^{k_j} \frac{1}{n_i} (z_i - \bar{z}_j)^2}{N - \sum_{i=1}^{N} \frac{1}{n_i}}

where

N = the total number of points in the region of interest,
z_i = the value of pollution at the point (x_i, y_i),
\bar{z}_j = the mean value corresponding to the circular window associated with z_j,
k_j = the number of points enclosed by z_j's circular window, of which z_i is one,
n_i = the number of circular windows containing z_i.


Let  us  examine  this  formula  in  light  of  a  few  examples. 


CASE 1.  Every point is in only one circular window. In this case n_i = 1 and thus the
denominator of MLV is zero. Since the radius is so small, no local variances can be
measured. MLV is undefined in this case.

CASE 2.  The radius is large enough that every point lies in every other point's circular
window. Then n_i = N and k_j = N and

MLV = \frac{\sum_{j=1}^{N} \sum_{i=1}^{N} \frac{1}{N} (z_i - \bar{z}_j)^2}{N - 1} = \frac{\sum_{i=1}^{N} (z_i - \bar{z})^2}{N - 1}

which is simply the sample variance of the points for the entire region. In other words,
MLV approaches the global sampling variance as the radius increases.


CASE 3.  Suppose there are m groups of completely overlapping circular windows. In other
words, there are m discrete clusters of points. In this case each cluster has its own mean
and sum of squares associated with it. Therefore each point within a particular cluster
has the same mean, \bar{z}_j, and sum of squares associated with it as any other point in its
cluster. In this case n_i = k_j, where k_j is the number of points in the jth cluster, and

MLV = \frac{\sum_{j=1}^{m} \sum_{i=1}^{k_j} (z_i - \bar{z}_j)^2}{N - m}

This is similar to the within mean square in one-way analysis of variance (Scheffé,
1959).

CASE 3a.  As a specific case of m discrete clusters, suppose each point has only one
other point in its circular window. So there is a series of pairs of overlapping windows
which do not contain any other points. In this case n_i = 2 for all i and

MLV = \frac{\sum_{j=1}^{N/2} \sum_{i=1}^{2} (z_i - \bar{z}_j)^2}{N/2}

This is case 3 with two points in each cluster. This generalises to the case where every
point is in k windows which do not overlap with any other group; then

MLV = \frac{\sum_{j=1}^{N/k} \sum_{i=1}^{k} (z_i - \bar{z}_j)^2}{N - N/k}


Since every circular window's center point is in every other point's circle
in the above examples of geographically clustered points, k_j = n_i. In general, for any particular
(z_i - \bar{z}_j)^2, z_i may be contained in n_i different circular windows while the window of z_j
may contain k_j points, and thus n_i and k_j are not necessarily equal. In other words, there may be
windows that z_i is contained in, but all the points in those windows are not necessarily
contained in z_i's window.
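
The measure and its limiting cases can be checked numerically. The sketch below is an illustration, not the authors' program: the function name mean_local_variance is hypothetical, coordinates are assumed planar, a point is taken to lie in a window when its distance from the window's center is at most the radius, and None is returned when the denominator is zero (Case 1).

```python
import math


def mean_local_variance(points, values, radius):
    """MLV for circular windows of the given radius centered on every point."""
    n = len(points)
    # windows[j] holds the indices of the points inside the window centered on j.
    windows = [[i for i in range(n) if math.dist(points[i], points[j]) <= radius]
               for j in range(n)]
    # counts[i] is n_i, the number of windows that contain point i.
    counts = [0] * n
    for window in windows:
        for i in window:
            counts[i] += 1
    numerator = 0.0
    for window in windows:
        local_mean = sum(values[i] for i in window) / len(window)
        numerator += sum((values[i] - local_mean) ** 2 / counts[i] for i in window)
    df = n - sum(1.0 / c for c in counts)   # the "degrees of freedom"
    if df < 1e-9:
        return None                          # Case 1: MLV is undefined
    return numerator / df


pts = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0), (101.0, 0.0)]
vals = [1.0, 3.0, 10.0, 14.0]
for r in (0.5, 2.0, 500.0):
    print(r, mean_local_variance(pts, vals, r))
```

With the smallest radius every window holds only its own center, with the middle radius the points form two separate pairs (Case 3a), and with the largest radius every window holds all four points (Case 2).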

Mean local variance is similar conceptually to the variogram function
described in Journel and Huijbregts (1978). The variogram describes variability
between two quantities separated by a distance h, while mean local variance describes
the variability in the data within a radius h. As with the variogram, to estimate mean local
variance from the realizations of Z(x,y), it must be assumed that the variogram function
depends only on h and not on the location of Z(x,y). This implies second-order stationarity,
which is described in detail in Journel and Huijbregts (1978). Although the
variogram function may also be useful for finding the appropriate smoother or window
size in NNR, this is not explored in this paper. However, we will explore the use of
mean local variance for this purpose, and the method using the variogram is analogous.

Mean local variance is relevant to the choice of the smoothing parameter in
the NNR method described earlier. If d_0 is to be large, then the minimum mean local
variance should occur at a larger radius so that relatively large weight is given to points
relatively far from the point of interest. In other words, the homogeneity of points
encompasses, on the average, a larger area for larger d_0. Likewise, if d_0 is chosen to be
small, points far away will be given relatively small weight and therefore the minimum
mean local variance should occur at a smaller circle radius.

Ohio is chosen as an example so that we can compare our minimum mean
local variance choice of d_0 with our cross-validatory choice of d_0 in the previous section.
Mean local variance is calculated for various radii. For each d_0, two associated radii
were chosen. One was 4 d_0, which is the same size that the NNR method uses in selecting
stations to compute its estimate of a point. The other radius is chosen such that the
\omega(d_i) are equal to .01 at the edge of the circular window. In Figures 2 and 3 the log
(base e) mean local variance is plotted against the radius of the circular windows for
suspended particulate and sulfur dioxide respectively. Logarithms are taken to magnify
the lower values of the curve. In Table 3 the actual values calculated for Ohio are given.
For comparison with the cross-validatory choices of the previous section, we chose windows
corresponding to d_0 = 1, 2, 5, 10, 20, and 50 km for calculating MLV. For suspended
particulate, the radius which gives the most homogeneous window size corresponds to
d_0 = 2 km by this method. These results are the same as those given by cross-validation
using weighted loss functions and similar to that obtained using the unweighted loss
functions (Table 1).
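
The radius at which the weight function falls to a given level follows from solving e^{-0.5 r^2 / d_0^2} = w for r. The sketch below assumes constant p_i; the function name radius_for_weight is hypothetical.

```python
import math


def radius_for_weight(d0, weight):
    """Distance at which exp(-0.5 * d^2 / d0^2) falls to `weight` (0 < weight < 1)."""
    return d0 * math.sqrt(-2.0 * math.log(weight))


for d0 in (1, 2, 5, 10, 20, 50):
    # Second column: the 4 * d0 cut-off; third column: radius where the weight is .01.
    print(d0, 4 * d0, round(radius_for_weight(d0, 0.01), 2))
```

With a threshold of .01 the radius is about 3.03 d_0; a threshold of .1 would give about 2.15 d_0.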

TABLE 3.  Mean Local Variance (MLV) as a function of d_0 for Suspended Particulate
and Sulfur Dioxide in the State of Ohio.

                           Suspended Particulate     Sulfur Dioxide
d_0 (km)   radius (km)     MLV        "df"           MLV        "df"
   1           3.03        .0583     141.80          .2129      50.37
   1           4.00        .0573     173.28          .1934      66.70
   2           4.29       *.0562     181.37          .1984      71.00
   2           8.00        .0597     240.35          .1930     101.46
   5          10.74        .0614     264.68         *.1887     113.73
   5          20.00        .0662     326.96          .1989     135.56
  10          21.47        .0671     334.21          .2008     137.58
  10          40.00        .0819     372.48          .2526     162.65
  20          42.94        .0837     375.20          .2644     164.31
  20          80.00        .0927     390.68          .3166     177.30
  50         107.36        .0945     393.21          .3468     180.41
  50         200.00        .0957     396.03          .4063     183.10
  **         500.00        .0972     397.00          .5069     184.00

*  The smallest mean local variance, corresponding to the optimal d_0.

** d_0 is that value which gives a radius large enough to contain all the points in Ohio.
Notice that "df" = N-1 since there are 398 active stations for suspended particulate and 185
active stations for sulfur dioxide in the region.


In section 2.3, the cross-validatory analysis for sulfur dioxide using weighted
loss functions suggests d_0 = 2 km while the unweighted analysis leans toward d_0 =
50 km. This implies that sulfur dioxide is more locally heterogeneous than suspended
particulate. Minimum mean local variance analysis shows this is true. Table 3 gives a
minimum at a circle radius of 10.74 km, which corresponds to d_0 = 5 km. Notice that this
solution creates no conflict of choice between two d_0's which are so far apart. The
minimum radius chosen will be at least near the global minimum.

In both Figures 2 and 3, MLV increases after a minimum area and continues
to increase steadily to the global sample variance of the entire region. Note the slight
instability of the curve at smaller circular window sizes. This instability is expected, since
with small k_j in each window we expect MLV to be extremely sensitive to the addition
of new points with slight increments in window size.


3.2  The  Shape  of  Mean  Local  Variance 


Mean local variance was calculated for sulfur dioxide and suspended particulate
in the states of New York and Florida. The results are presented in this section.
New York and Florida were chosen because, out of all states in the United States, these
states have the second and third largest number of monitoring stations respectively. For
each state, the basic shape of the curve showing mean local variance against circular window
size is similar. Sulfur dioxide is consistently more locally heterogeneous than
suspended particulate across all three states (Tables 3, 4 and 5, along with Figures 2
through 7).


TABLE 4.  Mean Local Variance (MLV) as a function of d_0 for Suspended Particulate
and Sulfur Dioxide in New York State.


Suspended Particulate

radius (km)  MLV

.0484  .0480  .0493  *.0481  .0516  .0608  .0631  .0808  .0837  .0974  .1033  .1107  .1168


Sulfur Dioxide

MLV

202.26  208.31  245.42  248.50  265.88  269.99  273.30  275.00

TABLE 5.  Mean Local Variance as a function of d_0 for Suspended Particulate and Sulfur
Dioxide in Florida State.


*  The  smallest  mean  local  variance,  corresponding  to  the  optimal  d0 . 

**  d0  is  that  value  which  gives  a  radius  large  enough  to  contain  all  the  points  in  the 
region. 


States and pollutants have different window sizes that give minimum mean
local variance. Since there are different dispersion and dilution mechanisms in different
geographic areas, in addition to the different spatial distribution of the samples, we should
expect different choices across states and pollutants. The table below shows the radii at
which the minima occur among states and pollutants.

             Radius (km)              Radius (km)
             Suspended Particulate    Sulfur Dioxide

Ohio               4.29                  10.74
New York           8.00                  10.74
Florida            4.00                   8.00

In Figures 4 through 7, line graphs are drawn to show the shape of MLV;
each curve has basically three regions. The first region is called the
Undefined Region. It ranges from a radius of 0 to the smallest radius at which at
least one circular window contains two points. The size of the Undefined Region is
determined by the particular geographic area under study and the spatial distribution
of stations in that area. The second region is called the Region of No-Precision, and
it includes the Undefined Region. It corresponds to
circular windows that are too small to contain enough points to measure local variance.
This region is not sharply defined, but it can be estimated by looking at the "degrees of
freedom" and at plots of MLV against the logarithm of the radius. The "degrees of freedom"
reflect the amount of overlap, that is, how many points lie in each other's
windows. The Region of No-Precision is marked on each of the plots,
although its boundary is not sharp. The third region consists of radii greater than
those in the Region of No-Precision. Windows in this region are large enough to contain
enough points for local variance to be measured. It is in this region that the optimum
choice for d0 lies.
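The upper limit of the Undefined Region follows directly from the station layout: a window centered on a station first contains a second point when its radius reaches the distance to that station's nearest neighbor. A minimal sketch, assuming planar station coordinates in kilometers and a window that includes its boundary (the function name and the coordinates are hypothetical, for illustration only):

```python
import math
from itertools import combinations


def undefined_region_limit(stations):
    """Smallest radius at which some circular window holds two stations.

    `stations` is a list of (x, y) coordinates in km (planar approximation,
    an assumption of this sketch). The limit equals the smallest pairwise
    distance between stations.
    """
    return min(math.dist(p, q) for p, q in combinations(stations, 2))


# Hypothetical coordinates, not taken from the monitoring data.
stations = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (10.0, 1.5)]
limit = undefined_region_limit(stations)
print(limit)
```

Below this radius every window holds only its own station, so no local variance can be formed at all.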

As an example, the "degrees of freedom" (df) for Ohio's
suspended particulate are plotted against the logarithm of the window radius in Figure
8. Note that df increases quickly when radii are small and then levels off as the radii become
large enough to include almost every station in the area. Degrees of freedom are a
monotonic function of window size. The plot of df versus the log of the radius is an
S-shaped curve. It is within the steep part of the S that the minimum MLV is likely to
occur.
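The behavior of df can be sketched numerically. The code below assumes that df equals n minus the sum of 1/n_i, where n_i is the number of stations (station i included) inside the window of station i; this form is inferred from the figures quoted in this section (276 stations, a sum of 268, 8 degrees of freedom) and should be read as an assumption. The coordinates are hypothetical.

```python
import math


def degrees_of_freedom(stations, radius):
    """Assumed definition: df = n - sum_i 1/n_i.

    n_i counts the stations (station i itself included) lying within
    `radius` of station i, with the boundary included.
    """
    n = len(stations)
    total = 0.0
    for p in stations:
        n_i = sum(1 for q in stations if math.dist(p, q) <= radius)
        total += 1.0 / n_i
    return n - total


# Hypothetical coordinates (km), not taken from the monitoring data.
stations = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (10.0, 1.5)]
radii = (1.0, 2.0, 6.0, 20.0)
dfs = [degrees_of_freedom(stations, r) for r in radii]
print(dfs)  # [0.0, 1.0, 2.0, 3.0]
```

Under this assumption df is 0 while every window holds only its own station and approaches n - 1 once every window holds all stations, which matches the monotone, leveling-off shape described for Figure 8.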

It is relatively easy to choose the optimum circle radius if MLV has the
functional behavior shown in most of these curves. In Figure 4 it is more difficult because
MLV does not increase steadily but bounces around initially. Using the method
described in the previous paragraph, we would choose the minimum, which in this case
lies in the area of no-precision, since it is the first trough-like area after a rapid increase.
This window radius is only one kilometer. Given the spatial distribution
of the monitoring stations, a radius of one kilometer would leave few stations
with enough points in their windows to measure local variance. With this
choice, the area of no-precision would have to end at a radius of 0.5 kilometers,
corresponding to an MLV with 8 "degrees of freedom". This implies that $\sum_i 1/n_i = 268$.
Since there are only 276 stations altogether, $n_i$ must be 1 for most stations. Thus at a
circle radius of 0.5 kilometers there are too few points to measure mean
local variance, and MLV should be considered imprecise at this radius. Therefore, care
must be taken in the choice of local minimum. We must closely examine the degrees of
freedom to see how reliable MLV is at various circular window sizes before choosing the
minimum.
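The counting argument above can be made explicit. The sketch below assumes that the degrees of freedom equal $n - \sum_i 1/n_i$, where $n$ is the number of stations and $n_i$ counts the stations in the window of station $i$, including station $i$ itself; this form is inferred from the figures quoted above. The symbol $k$ is introduced here only for illustration.

```latex
\begin{align*}
\mathrm{df} &= n - \sum_{i=1}^{n} \frac{1}{n_i}
  \;\Longrightarrow\; \sum_{i=1}^{n} \frac{1}{n_i} = 276 - 8 = 268, \\
\intertext{and if $k$ stations have $n_i \ge 2$, each of them contributes at most $1/2$ to the sum, so}
268 &= \sum_{i=1}^{n} \frac{1}{n_i} \le (276 - k) + \frac{k}{2}
  \;\Longrightarrow\; k \le 16 .
\end{align*}
```

Under this assumption, at most 16 of the 276 stations have any other station within 0.5 kilometers, so nearly all windows contain a single point.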

There are perhaps two ways to use mean local variance to choose the
"best" circular window radius, corresponding to the "best" d0. One method is to choose a
point estimate: the radius at which MLV attains its minimum beyond the upper limit of the
"no-precision" area. This method might not be the best, since the MLV curve is data dependent
and therefore fluctuates around the minimum, making the minimum difficult to determine. The
other method is to choose a range of radii called the "Region of Homogeneity". In each of the
MLV curves, after the region of no-precision there is a trough-like area in which the minimum
occurs, after which MLV starts to increase rapidly. We could call this trough-like area the Region
of Homogeneity. Choosing a larger radius allows us to use more points, which improves reliability.
Yet we do not want to use insignificant points. Thus choosing the largest possible radius
before MLV starts increasing rapidly towards the global sample variance may be another
acceptable method. In other words, the estimate should not include any points
that lie outside the area of homogeneity. Which point is chosen as giving the most
homogeneous window size is not critical for estimation. However, it is important to
know which window sizes not to use, and those sizes can be determined easily by
investigating MLV and its relationship with circular window size.
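The two selection rules can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the tables: the cutoff `min_df` that marks the end of the no-precision region and the relative `tolerance` that delimits the trough are both hypothetical parameters, and the sample numbers are invented.

```python
def choose_radius(radii, mlv, df, min_df, tolerance=0.05):
    """Return (point_estimate, (region_low, region_high)).

    radii, mlv, df: parallel lists sorted by increasing radius.
    min_df: hypothetical cutoff; radii with df below it are treated as
        lying in the no-precision region and are never selected.
    tolerance: hypothetical relative slack; the trough is the run of
        usable radii around the minimum whose MLV stays within it.
    """
    usable = {i for i in range(len(radii)) if df[i] >= min_df}
    if not usable:
        raise ValueError("no radius lies beyond the no-precision region")
    best = min(usable, key=lambda i: mlv[i])
    limit = mlv[best] * (1.0 + tolerance)
    lo = hi = best
    # Extend the trough in both directions while MLV stays near its minimum.
    while lo - 1 in usable and mlv[lo - 1] <= limit:
        lo -= 1
    while hi + 1 in usable and mlv[hi + 1] <= limit:
        hi += 1
    return radii[best], (radii[lo], radii[hi])


# Invented values for illustration only.
radii = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
mlv = [0.040, 0.030, 0.052, 0.049, 0.048, 0.050, 0.070]
df = [8, 20, 60, 120, 180, 230, 260]
point, region = choose_radius(radii, mlv, df, min_df=50)
print(point, region)  # 8.0 (4.0, 16.0)
```

In this invented example the global minimum at a radius of 1.0 is ignored because its df falls below the cutoff. The upper end of the returned range corresponds to the second rule, the largest radius before MLV starts to rise rapidly.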


4.  Conclusions 


A non-parametric regression method has been used to estimate pollution
concentrations for particular geographic areas. To determine the best value of the
smoothing parameter d0, cross-validation techniques were used. Cross-validation was
established as a method for finding the appropriate level of smoothing by the work of
Wahba (1975) and others. The concept of mean local variance (MLV) is explored
as an alternative means of finding d0. An example of the MLV approach has been given
here using these data. Further work would include simulations of Z(x,y) to find the
distributional properties of mean local variance.

Minimum mean local variance analysis could also be used for choosing
parameters in other methods of spatial estimation. For example, mean local variance can
be plotted as a function of the number of nearest neighbors or the size of the partition,
so that the minimum mean local variance would correspond to the optimum number of
neighbors or partition size in the same way that it corresponds to the optimum d0.
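As an illustration of the nearest-neighbor variant, the sketch below computes MLV as a function of the number of neighbors k. It rests on an assumed definition, not one restated from this section: the local variance at a station is the mean squared deviation of the k nearest values (the station's own value included) about their local mean, and MLV is the sum of the local variances divided by the degrees of freedom, taken here as n(1 - 1/k). The function name and data are hypothetical.

```python
import math


def mlv_by_neighbors(stations, values, k):
    """MLV over windows made of each station's k nearest stations (itself included).

    Assumed definition: local variance = mean squared deviation about the
    local mean; MLV = (sum of local variances) / (n - sum_i 1/n_i), where
    every n_i equals k, so the divisor is n * (1 - 1/k).
    """
    n = len(stations)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of stations")
    total = 0.0
    for p in stations:
        # Indices of all stations ordered by distance from station p.
        order = sorted(range(n), key=lambda j: math.dist(p, stations[j]))
        local = [values[j] for j in order[:k]]
        mean = sum(local) / k
        total += sum((v - mean) ** 2 for v in local) / k
    return total / (n * (1.0 - 1.0 / k))


# Invented data: two tight pairs of stations with different levels.
stations = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
values = [1.0, 3.0, 10.0, 14.0]
mlv2 = mlv_by_neighbors(stations, values, 2)
mlv4 = mlv_by_neighbors(stations, values, 4)
print(mlv2, mlv4)
```

With k equal to the number of stations, this assumed MLV reduces to the ordinary sample variance of all values, which agrees with the remark that MLV rises towards the global sample variance as windows grow.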

Mean local variance is also of interest in its own right. MLV can be useful
for comparing geographic regions, spatial sampling distributions, and the dispersion
mechanisms of different pollutants.


FIGURE 2. Mean Local Variance vs. Logarithm of Radius
for Suspended Particulate in Ohio.

FIGURE 3. Mean Local Variance vs. Logarithm of Radius
for Sulfur Dioxide in Ohio.

FIGURE 4. Mean Local Variance vs. Log of Radius
for Suspended Particulate in New York State.

FIGURE 5. Mean Local Variance vs. Log of Radius
for Sulfur Dioxide in New York State.

FIGURE 7. Mean Local Variance vs. Log of Radius
for Sulfur Dioxide in Florida.

FIGURE 8. Degrees of Freedom vs. Logarithm of Circle Radius
for Suspended Particulate in Ohio.